Back-Translation for Discovering Distant Protein Homologies
Abstract
Frameshift mutations in protein-coding DNA sequences drastically change the resulting protein sequence, which prevents classic protein alignment methods from revealing the proteins’ common origin. Moreover, when a large number of substitutions are also involved in the divergence, homology becomes difficult to detect even at the DNA level. To cope with this situation, we propose a novel method for inferring distant homology between two proteins that accounts for the frameshift and point mutations that may have affected their coding sequences. We design a dynamic programming alignment algorithm over memory-efficient graph representations of the complete set of putative DNA sequences of each protein. Its goal is to determine the two putative DNA sequences that have the best-scoring alignment under a scoring system designed to reflect the most probable evolutionary process. This allows us to uncover evolutionary information that traditional alignment methods do not capture, as confirmed by biologically significant examples.
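The idea of aligning two proteins through their back-translations can be illustrated with a minimal sketch. It assumes the standard genetic code and simple unit costs for nucleotide substitutions and indels; the paper's own scoring system is more elaborate and is not reproduced here, and the function names `back_translation_graph`, `count_sequences` and `graph_edit_distance` are hypothetical. Each amino acid becomes a small trie of its codons, so the graph grows linearly with protein length while its paths spell every putative DNA sequence. A dynamic program over pairs of graph nodes then finds the two putative DNA sequences with the fewest nucleotide edits, and because single-nucleotide insertions and deletions are allowed, a frameshift costs only one edit.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (assumption: NCBI table 1), codons in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
CODONS = {}
for codon, aa in CODE.items():
    CODONS.setdefault(aa, []).append(codon)


def back_translation_graph(protein):
    """Build a DAG whose source-to-sink paths spell exactly the DNA
    sequences that translate to `protein`. Nodes are (aa_index, codon_prefix);
    nodes are returned in topological order, edges as node -> [(base, node)]."""
    nodes = [(0, "")]
    edges = {}
    for i, aa in enumerate(protein):
        prefixes = sorted({c[:k] for c in CODONS[aa] for k in range(3)}, key=len)
        for p in prefixes:
            out = []
            for b in sorted({c[len(p)] for c in CODONS[aa] if c.startswith(p)}):
                q = p + b
                # A completed codon moves on to the next amino acid.
                out.append((b, (i + 1, "") if len(q) == 3 else (i, q)))
            edges[(i, p)] = out
            if p:
                nodes.append((i, p))
        nodes.append((i + 1, ""))
    return nodes, edges


def count_sequences(protein):
    """Count putative DNA sequences by counting source-to-sink paths."""
    nodes, edges = back_translation_graph(protein)
    ways = {u: 0 for u in nodes}
    ways[nodes[0]] = 1
    for u in nodes:
        for _, u2 in edges.get(u, []):
            ways[u2] += ways[u]
    return ways[nodes[-1]]


def graph_edit_distance(prot_a, prot_b):
    """Minimum number of nucleotide substitutions, insertions and deletions
    between ANY back-translation of prot_a and ANY back-translation of prot_b."""
    nodes_a, edges_a = back_translation_graph(prot_a)
    nodes_b, edges_b = back_translation_graph(prot_b)
    inf = float("inf")
    dist = {(u, v): inf for u in nodes_a for v in nodes_b}
    dist[(nodes_a[0], nodes_b[0])] = 0
    # Lexicographic order over (node_a, node_b) is topological for the pair DAG.
    for u in nodes_a:
        for v in nodes_b:
            d = dist[(u, v)]
            if d == inf:
                continue
            out_a, out_b = edges_a.get(u, []), edges_b.get(v, [])
            for a, u2 in out_a:
                # Base of A against a gap (indel; shifts the reading frame).
                dist[(u2, v)] = min(dist[(u2, v)], d + 1)
                for b, v2 in out_b:
                    # Match (cost 0) or substitution (cost 1).
                    dist[(u2, v2)] = min(dist[(u2, v2)], d + (a != b))
            for b, v2 in out_b:
                # Base of B against a gap.
                dist[(u, v2)] = min(dist[(u, v2)], d + 1)
    return dist[(nodes_a[-1], nodes_b[-1])]
```

For example, `count_sequences("LLLL")` is 1296 (six leucine codons per position), while the graph has only 21 nodes. `graph_edit_distance("MAAAA", "MCSSS")` returns 2: the two proteins share only the initial methionine, yet `ATGGCAGCAGCAGCA` (MAAAA) and `ATGTGCAGCAGCAGC` (MCSSS) differ by one inserted T and one deleted trailing A, which is the kind of frameshift-induced relationship a protein-level alignment misses.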
Similar resources
Primary structure of human C-reactive protein.
The complete amino acid sequence of human C-reactive protein has been established. Distant homologies to the C3 homology region in the CH2 domain of IgG and to C3a anaphylatoxin have been noted. No homology to other immunoglobulin homology regions or to the same homology region in other heavy chains was observed. The previously reported homologies between rabbit and human C-reactive protein and pro...
The Expression of FOXE-1 and STIP-1 in Papillary Thyroid Carcinoma and Their Relationship with Patient Prognosis
Background and Objective: Most patients with papillary thyroid carcinoma (PTC) have a favorable outcome, but because the tumor readily invades nearby tissues, there is a high risk of regional and distant lymph-node (LN) metastases, which are associated with poor prognostic parameters, early recurrence, and distant metastasis, leading to poor patient outcomes. ...
Optimization of a new score function for the detection of remote homologs.
The growth in protein sequence data has placed a premium on ways to infer structure and function of the newly sequenced proteins. One of the most effective ways is to identify a homologous relationship with a protein about which more is known. While close evolutionary relationships can be confidently determined with standard methods, the difficulty increases as the relationships become more dis...
Features Extraction For Protein Homology Detection Using Hidden Markov Models Combining Scores
A few years ago, Jaakkola and Haussler published a method combining generative and discriminative approaches for detecting protein homologies. The method was a variant of support vector machines using a new kernel function called the Fisher kernel. They begin by training a generative hidden Markov model for a protein family. Then, using the model, they derive a vector of features called Fisher sc...
Discovering Domains Mediating Protein Interactions
Background: Protein-protein interactions do not provide any direct information about which domains within the proteins mediate the interactions. Most proteins are multi-domain proteins, and the interaction between them is often defined by pairs of their domains. Most previous studies focus only on interacting domain pairs; however, they do not consider the in...